PSM: A New Re-Ranking Algorithm for Named-Page
Authors
Abstract
This year, the IR group of ICT participated in the named-page finding subtask of the Terabyte track for the first time. Since the document collection is about 426 GB, our most important goal was to find an efficient way to locate the target web page in such a large data set. At the same time, we wanted to keep indexing and retrieval at a reasonably low cost in both hardware and time. We used our “FirteX” engine for indexing and retrieval in this task. Indexing took less than 15 hours, and retrieval took less than 2 seconds per query. The main contribution of our work is a Pattern Similarity Matching (PSM) re-ranking algorithm that reorders the results so that the target document is ranked as close to the top as possible. In our experiments, it gave encouraging performance on last year’s (2005) topics. Our procedure can be divided into three parts: data preprocessing, indexing and retrieval, and re-ranking.
Similar resources
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...
Incremental Web Search: Tracking Changes in the Web
A large amount of new information is posted on the Web every day. Large-scale web search engines often update their indexes slowly and are unable to present such information in a timely manner. In this thesis, we present our solutions for finding new information on the web by tracking changes in web documents. First, we present the algorithms and techniques useful for solving the following...
Efficient Algorithm for Mining on Bio Medical Data for Ranking the Web Pages
Information on the internet is growing in volume across many different sources. Extracting tuples from HTML pages has been an important issue in various web applications such as web data integration, e-commerce market monitoring, and mashups that repurpose and selectively combine existing web data services. Data Mining is the process of analyzing data from different perspectives and...
A Frequency Mining-Based Algorithm for Re-ranking Web Search Engine Retrievals
Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore achieve good recall, but return far more pages than a user can look at. Precision, however, is a critical factor in these conditions, because the most relevant documents should be presented at the top of the list. In this paper, we propose an online page re-rank model whi...
Publication date: 2006